Practical Scalable Consensus for Pseudo-Synchronous Distributed Systems: Formal Proof

Authors

  • Thomas Herault
  • Aurelien Bouteiller
  • George Bosilca
  • Marc Gamell
  • Keita Teranishi
  • Manish Parashar
  • Jack Dongarra
Abstract

The ability to consistently handle faults in a distributed environment requires, among a small set of basic routines, an agreement algorithm allowing surviving entities to reach a consensual decision between a bounded set of volatile resources. This paper presents an algorithm that implements an Early Returning Agreement (ERA) in pseudo-synchronous systems, which optimistically allows a process to resume its activity while guaranteeing strong progress. We prove the correctness of our ERA algorithm and expose its logarithmic behavior, which is an extremely desirable property for any algorithm that targets future exascale platforms. We detail a practical implementation of this consensus algorithm in the context of an MPI library, and evaluate both its efficiency and scalability through a set of benchmarks and two fault-tolerant scientific applications.

Similar resources

Formal Verification of Consensus Algorithms Tolerating Malicious Faults

Consensus is the paradigmatic problem in fault-tolerant distributed computing: it requires network nodes that communicate by message passing to agree on a common value even in the presence of (benign or malicious) faults. Several algorithms for solving Consensus exist, but few of them have been rigorously verified, much less so formally. The Heard-Of model proposes a simple, unifying framework fo...

A Consensus Algorithm for Synchronous Distributed Systems using Mobile Agent

In this paper, we present a consensus algorithm for synchronous distributed systems using cooperating mobile agents. The algorithm is designed within a framework for mobile agent enabled distributed server groups (MADSG), where cooperating mobile agents are used to achieve coordination among the servers. Being autonomous and cooperative, cooperating mobile agents exchange information among them...

Formalization and Correctness of the PALS Pattern for Asynchronous Real-Time Systems

Due to physical requirements, what in essence and at a higher level of abstraction is a logically synchronous real-time system often has to be realized as a distributed, asynchronous system. Getting asynchronous real-time systems right is a very error-prone and labor-intensive task. The Physically Asynchronous Logically Synchronous (PALS) architectural pattern can greatly reduce the design and ...

A Case Study of Agreement Problems in Distributed Systems: Non-Blocking Atomic Commitment

This paper considers an agreement problem whose practical interest is well known, namely the Non-Blocking Atomic Commitment Problem. First, a generic protocol solving this problem is given and then instantiations of its generic statements are provided for both synchronous and asynchronous distributed systems. These instantiations use a few basic components: timeout mechanism and reliable multic...

Anonymous Byzantine Consensus from Moderately-Hard Puzzles: A Model for Bitcoin

We present a formal model of synchronous processes without distinct identifiers (i.e., anonymous processes) that communicate using one-way public broadcasts. Our main contribution is a proof that the Bitcoin protocol achieves consensus in this model, except for a negligible probability, when Byzantine faults make up less than half the network. The protocol is scalable, since the running time an...

Publication date: 2015